Data Distribution Management in Large-Scale Distributed Environments
Abstract
Data Distribution Management (DDM) deals with two basic problems: how to distribute data generated at the application layer among the underlying nodes of a distributed system, and how to retrieve that data whenever it is needed. This thesis explores DDM in two different network environments: peer-to-peer (P2P) overlay networks and cluster-based network environments. In P2P overlay networks, DDM is less a simple data-fetching scheme than a complete approach to building and maintaining a P2P overlay architecture, and it is closely related to what is more commonly known as associative searching or querying. In the cluster-based network environment, DDM is one of the important services that simulation middleware provides to support real-time distributed interactive simulations. The only feature that DDM shares across the two environments is that both are built to provide a data indexing service.
Because of these fundamental differences, we have designed and developed a novel distributed data structure, the Hierarchically Distributed Tree (HD Tree), to support range queries in P2P overlay networks. All the relevant problems of a distributed data structure, including scalability, self-organization, fault tolerance, and load balancing, have been studied. Both theoretical analysis and experimental results show that the HD Tree is able to give a complete view of system states when processing multi-dimensional range queries at different levels of selectivity and in various error-prone routing environments. On the other hand, a novel DDM scheme, the Adaptive Grid-based DDM scheme, is proposed to improve DDM performance in the cluster-based network environment. This new scheme evaluates the input size of a simulation based on probability models. Optimum DDM performance is best approached by running the simulation in the mode most appropriate to its size.
List of Publications Related to Thesis
Refereed Journal Papers
• Yunfeng Gu, A. Boukerche: "HD Tree: A Novel Data Structure to Support Multi-Dimensional Range Query for P2P Networks". Journal of Parallel and Distributed Computing, Volume 71, Issue 8 (Aug. 2011), pp. 1111-1124
• Yunfeng Gu, A. Boukerche: "An Efficient Adaptive Transmission Control Scheme for Large-scale Distributed Simulation System". IEEE Transactions on Parallel and Distributed Systems, Volume 20, Issue 2 (Feb. 2009), pp. 246-260
• Yunfeng Gu, A. Boukerche, and Regina B. Araujo: "Performance Analysis of An Adaptive Dynamic Grid-based Approach to Data Distribution Management". Journal of Parallel and Distributed Computing, Volume 68, Issue 4 (Apr. 2008), pp. 536-547
• A. Boukerche, Yunfeng Gu, and Regina B. Araujo: "An Adaptive Dynamic Grid-Based approach to DDM for Large-scale Distributed Simulation Systems". Journal of Computer and System Sciences, Volume 74 (Sep. 2008), pp. 1043-1054
Refereed Conference Papers
• Yunfeng Gu, A. Boukerche: "Error-Resilient Routing for Supporting Multidimensional Range Query in HD Tree". 2011 IEEE/ACM 15th International Symposium on Distributed Simulation and Real Time Applications (DS-RT’11), 4-7 Sep. 2011, MediaCityUK, Salford, UK, pp. 44-51
• Yunfeng Gu, A. Boukerche: "Hierarchically Distributed Tree". 2011 IEEE 16th Symposium on Computers and Communications (ISCC’11), 28 Jun.-1 Jul. 2011, Kerkyra, Greece, pp. 91-96
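To make the idea of grid-based DDM in the abstract more concrete, the following is a minimal Python sketch of generic grid-based region matching, in which publishers and subscribers are matched whenever their regions touch a common grid cell. It is not the Adaptive Grid-based DDM scheme proposed in the thesis; the function names (`cells_covered`, `grid_match`), the entity names, and the fixed `cell_size` parameter are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import product


def cells_covered(region, cell_size):
    """Return the set of grid cells overlapped by an axis-aligned region.

    `region` is a tuple of (low, high) extents, one pair per dimension.
    (Hypothetical helper, not taken from the thesis.)
    """
    ranges = [
        range(int(low // cell_size), int(high // cell_size) + 1)
        for low, high in region
    ]
    return set(product(*ranges))


def grid_match(publishers, subscribers, cell_size):
    """Map each publisher to the subscribers sharing at least one grid cell."""
    # Index subscribers by every cell their region touches.
    cell_to_subs = defaultdict(set)
    for sub_id, region in subscribers.items():
        for cell in cells_covered(region, cell_size):
            cell_to_subs[cell].add(sub_id)

    # A publisher matches every subscriber registered in one of its cells.
    matches = {}
    for pub_id, region in publishers.items():
        found = set()
        for cell in cells_covered(region, cell_size):
            found |= cell_to_subs.get(cell, set())
        matches[pub_id] = found
    return matches


# Illustrative two-dimensional regions (made-up values).
publishers = {"tank": ((0.0, 4.0), (0.0, 4.0))}
subscribers = {
    "radar": ((3.0, 9.0), (3.0, 9.0)),      # truly overlaps "tank"
    "scout": ((6.0, 8.0), (6.0, 8.0)),      # near "tank" but not overlapping
    "sonar": ((20.0, 25.0), (20.0, 25.0)),  # far away
}

fine = grid_match(publishers, subscribers, cell_size=5.0)
coarse = grid_match(publishers, subscribers, cell_size=10.0)
print(fine)
print(coarse)
```

With the finer grid, only "radar" is matched to "tank". With the coarser grid, "scout" is matched as well, even though the two regions do not overlap, because both fall into the same cell. This sensitivity to cell size is a general property of grid-based matching, and it suggests why a scheme might want to adapt its behavior to the size of the simulation.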
Similar Resources
Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments
The Hadoop MapReduce framework is an important distributed processing model for large-scale data-intensive applications. The current rack-aware data placement strategy of Hadoop and the Hadoop distributed file system assumes a homogeneous cluster, in which each node has the same computing capacity and is assigned the same workload. Default Hadoop d...
An Efficient Data Replication Strategy in Large-Scale Data Grid Environments Based on Availability and Popularity
Data grid technology, which uses the scale of the Internet to overcome storage limitations for huge amounts of data, has become a hot research topic. Recently, data replication strategies have been widely employed in distributed environments to copy frequently accessed data to suitable sites. The primary purposes are shortening the distance of file transmission and achieving files from ...
E2DR: Energy Efficient Data Replication in Data Grid
Data grids are an important branch of grid computing that provides mechanisms for managing large volumes of distributed data. Energy efficiency has recently emerged as a hot topic in large distributed systems. The development of computing systems has traditionally focused on performance improvements driven by the demands of client applications in scientific and business domai...
Dynamic Grid-Based Interest Management for Distributed Virtual Environments
In large-scale distributed virtual environments, simulated entities maintain a consistent virtual world by exchanging messages about their state information in a timely fashion. The problem of Interest Management (IM) or, as it sometimes appears in the literature, Data Distribution Management (DDM), is the problem of delivering to each entity only the data updates that it absolutely requires to...
Access control in ultra-large-scale systems using a data-centric middleware
The primary characteristic of an Ultra-Large-Scale (ULS) system is ultra-large size on any related dimension. A ULS system is generally considered a system-of-systems with heterogeneous nodes and autonomous domains. As the size of a system-of-systems grows and the demand for interoperability between sub-systems increases, achieving a more scalable and dynamic access control system becomes an im...
Publication date: 2011